As an important data mining technique, high-utility itemset mining (HUIM) is used to discover interesting but hidden information (e.g., profit and risk). HUIM has been widely applied in many application scenarios, such as market analysis, medical detection, and web click-stream analysis. However, most previous HUIM approaches often ignore the relationships between the items in an itemset. Consequently, many uncorrelated combinations (e.g., \{gold, apple\} and \{notebook, book\}) are discovered by HUIM. To address this limitation, many algorithms have been proposed to mine correlated high-utility itemsets (CoHUIs). In this paper, we propose a novel algorithm called Itemset Utility Maximization with Correlation Measure (CoIUM), which considers both a strong correlation and the profitable values of the items. The new algorithm adopts a database projection mechanism to reduce the cost of database scanning, and two upper bounds and four pruning strategies are utilized to effectively prune the search space. A concise array-based structure named utility-bin is used to calculate and store the adopted upper bounds in linear time and space. Finally, extensive experimental results on dense and sparse datasets demonstrate that CoIUM significantly outperforms state-of-the-art algorithms in terms of runtime and memory consumption.
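As a concrete illustration of the utility computations that HUIM algorithms perform, the following Python sketch computes the utility of an itemset over a small quantitative transaction database. The transactions, unit profits, and the `itemset_utility` helper are illustrative assumptions, not CoIUM's actual implementation.

```python
# A minimal sketch of high-utility itemset computation over a standard
# quantitative transaction database; names and data are illustrative only.

# Each transaction maps an item to its purchase quantity.
transactions = [
    {"gold": 1, "apple": 3},
    {"notebook": 2, "book": 1, "apple": 2},
    {"gold": 2, "book": 4},
]

# External utility: the unit profit of each item.
unit_profit = {"gold": 50, "apple": 2, "notebook": 5, "book": 3}

def itemset_utility(itemset, transactions, unit_profit):
    """Sum of (quantity * unit profit) over transactions containing the itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total

print(itemset_utility({"gold", "apple"}, transactions, unit_profit))  # 56
```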
In settings ranging from weather forecasts to political prognostications to financial projections, probability estimates of future binary outcomes often evolve over time. For example, the estimated likelihood of rain on a specific day changes by the hour as new information becomes available. Given a collection of such probability paths, we introduce a Bayesian framework - which we call the Gaussian latent information martingale, or GLIM - for modeling the structure of dynamic predictions over time. Suppose, for example, that the likelihood of rain in a week is 50%, and consider two hypothetical scenarios. In the first, one expects the forecast to be equally likely to become 25% or 75% tomorrow; in the second, one expects the forecast to stay constant for the next several days. A time-sensitive decision-maker might select a course of action immediately in the latter scenario, but may postpone their decision in the former, knowing that new information is imminent. We model these trajectories by assuming that predictions update according to a latent process of information flow, which is inferred from historical data. In contrast to general methods for time series analysis, this approach preserves important properties of probability paths, such as the martingale structure and an appropriate amount of volatility, and better quantifies future uncertainties around probability paths. We show that GLIM outperforms three popular baseline methods, producing better estimated posterior probability path distributions as measured by three different metrics. By elucidating the dynamic structure of predictions over time, we hope to help individuals make more informed choices.
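To illustrate the martingale structure the framework is designed to preserve, here is a toy Python simulation (requiring NumPy and SciPy) of probability paths whose updates are driven by a latent information-flow schedule. The Gaussian construction and the `info_flow` schedule are simplified stand-ins, not GLIM's actual model or inference procedure.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Latent information-flow schedule: the share of total signal variance
# revealed at each step (sums to 1); front-loaded here as an example.
info_flow = np.array([0.4, 0.3, 0.2, 0.1])

def simulate_path(info_flow, rng):
    """One probability path for the event {Z > 0}, Z ~ N(0, 1), where
    increments of Z are revealed according to info_flow. By construction
    each path is a martingale."""
    z, revealed = 0.0, 0.0
    path = [norm.sf(0.0)]  # prior probability: 0.5
    for v in info_flow:
        z += rng.normal(scale=np.sqrt(v))          # newly revealed information
        revealed += v
        remaining = max(1.0 - revealed, 1e-12)
        path.append(norm.sf(-z / np.sqrt(remaining)))  # P(Z > 0 | info so far)
    return np.array(path)

paths = np.stack([simulate_path(info_flow, rng) for _ in range(5)])
print(np.round(paths, 3))
```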
The rise of machine learning techniques has sparked a boom of applications in electronic design automation (EDA), helping to raise the degree of automation in chip design. However, manually crafted machine learning models require extensive human expertise and tremendous engineering effort. In this work, we leverage neural architecture search (NAS) to automatically develop high-quality neural architectures for routability prediction, which helps guide cell placement toward routable solutions. Our search method supports various operations and highly flexible connections, leading to architectures significantly different from all previous human-crafted models. Experimental results on a large dataset demonstrate that our automatically generated neural architectures clearly outperform multiple representative manually crafted solutions. Compared to the best case of the manually crafted models, the NAS-generated model achieves a 5.85% improvement in Kendall's $\tau$ for predicting the number of nets with DRC violations and a 2.12% improvement in the area under the ROC curve (ROC-AUC) for DRC hotspot detection. Moreover, in contrast to handcrafted models, which can easily take weeks to develop, our efficient NAS approach completes the whole automated search process in only 0.3 days.
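For intuition about the overall search loop, here is a minimal random-search NAS sketch in Python. The operation space, architecture encoding, and the `evaluate` stub are hypothetical placeholders; the paper's search method, which supports far richer connectivity, is not reproduced here.

```python
import random

# A toy random-search NAS loop over a small operation space; the search
# space, evaluation stub, and scoring are illustrative assumptions only.
OPS = ["conv3x3", "conv5x5", "dilated_conv", "max_pool", "skip_connect"]

def sample_architecture(n_layers=6):
    """Sample one candidate: an operation per layer plus a skip pattern."""
    return {
        "ops": [random.choice(OPS) for _ in range(n_layers)],
        "skips": [random.random() < 0.5 for _ in range(n_layers)],
    }

def evaluate(arch):
    """Placeholder for training the candidate on routability labels and
    returning a validation metric (e.g., Kendall's tau or ROC-AUC)."""
    return random.random()  # stand-in score

best_arch, best_score = None, float("-inf")
for _ in range(100):  # search budget
    arch = sample_architecture()
    score = evaluate(arch)
    if score > best_score:
        best_arch, best_score = arch, score

print(best_score, best_arch["ops"])
```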
We consider the problem of continually releasing an estimate of the population mean of a stream of samples that is user-level differentially private (DP). At each time instant, a user contributes a sample, and the users can arrive in arbitrary order. Until now, these requirements of continual release and user-level privacy were considered in isolation. But, in practice, both requirements come together, as users often contribute data repeatedly and multiple queries are made. We provide an algorithm that outputs a mean estimate at every time instant $t$ such that the overall release is user-level $\varepsilon$-DP and has the following error guarantee: Denoting by $M_t$ the maximum number of samples contributed by a user, as long as $\tilde{\Omega}(1/\varepsilon)$ users have $M_t/2$ samples each, the error at time $t$ is $\tilde{O}(1/\sqrt{t}+\sqrt{M_t}/(t\varepsilon))$. This is a universal error guarantee which is valid for all arrival patterns of the users. Furthermore, it (almost) matches the existing lower bounds for the single-release setting at all time instants when users have contributed an equal number of samples.
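As background for the error guarantee, the following Python sketch shows the standard single-release building block: average each user's samples so each user touches exactly one summand, then add Laplace noise calibrated to the user-level sensitivity. This is a sketch of the baseline setting, not the paper's continual-release algorithm; the clipping range and data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def user_level_dp_mean(user_samples, eps, lo=0.0, hi=1.0):
    """One-shot user-level eps-DP mean estimate: average each user's samples
    first (so each user affects exactly one summand), clip to [lo, hi], then
    add Laplace noise scaled to the user-level sensitivity (hi - lo) / n.
    A sketch of the single-release baseline, not the paper's algorithm."""
    per_user = np.clip([np.mean(s) for s in user_samples], lo, hi)
    n = len(per_user)
    sensitivity = (hi - lo) / n  # replacing one user moves the mean this much
    noise = rng.laplace(scale=sensitivity / eps)
    return float(np.mean(per_user)) + noise

# Assumed toy data: 1000 users, each contributing 1-9 samples in [0, 1].
users = [rng.uniform(size=rng.integers(1, 10)) for _ in range(1000)]
print(user_level_dp_mean(users, eps=1.0))
```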
One of the main challenges in deep learning-based underwater image enhancement is the limited availability of high-quality training data. Underwater images are difficult to capture and are often of poor quality due to the distortion and loss of colour and contrast in water. This makes it difficult to train supervised deep learning models on large and diverse datasets, which can limit the model's performance. In this paper, we explore an alternative approach to supervised underwater image enhancement. Specifically, we propose a novel unsupervised underwater image enhancement framework that employs a conditional variational autoencoder (cVAE) to train a deep learning model with probabilistic adaptive instance normalization (PAdaIN) and a statistically guided multi-colour space stretch that produces realistic underwater images. The resulting framework, which we call UDnet, is composed of a U-Net as a feature extractor and a PAdaIN module to encode the uncertainty. To improve the visual quality of the images generated by UDnet, we use a statistically guided multi-colour space stretch module that ensures visual consistency with the input image and provides an alternative to training with a ground-truth image. The proposed model needs no manual human annotation, can learn from a limited amount of data, and achieves state-of-the-art results on underwater images. We evaluated our proposed framework on eight publicly available datasets. The results show that our proposed framework yields competitive performance compared to other state-of-the-art approaches in quantitative as well as qualitative metrics. Code is available at https://github.com/alzayats/UDnet .
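To give a flavour of the PAdaIN idea, here is a hedged NumPy sketch of adaptive instance normalization with sampled style statistics. UDnet's actual module operates on learned encoder features inside the network; the shapes, noise model, and statistics here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def probabilistic_adain(content, style_mu, style_sigma, noise_scale=0.1):
    """Adaptive instance norm with sampled (rather than fixed) style stats:
    normalize per-channel content features, then re-scale/shift with style
    statistics perturbed by Gaussian noise. Shapes: (C, H, W). Illustrative
    sketch only; UDnet's PAdaIN operates on learned encoder features."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_sigma = content.std(axis=(1, 2), keepdims=True) + 1e-6
    normalized = (content - c_mu) / c_sigma
    # Sample the style statistics to encode uncertainty.
    mu = style_mu + rng.normal(scale=noise_scale, size=style_mu.shape)
    sigma = style_sigma + rng.normal(scale=noise_scale, size=style_sigma.shape)
    return sigma * normalized + mu

content = rng.normal(size=(3, 8, 8))  # assumed 3-channel feature map
out = probabilistic_adain(content, style_mu=np.ones((3, 1, 1)),
                          style_sigma=2 * np.ones((3, 1, 1)))
print(out.shape)
```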
A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, we find that - across 6 architectures trained with 4 algorithms on massive datasets - they exhibit little compositionality. To arrive at this conclusion, we introduce a new compositionality evaluation benchmark CREPE which measures two important aspects of compositionality identified by cognitive science literature: systematicity and productivity. To measure systematicity, CREPE consists of three test datasets. The three test sets are designed to test models trained on three of the popular training datasets: CC-12M, YFCC-15M, and LAION-400M. They contain 385K, 385K, and 373K image-text pairs and 237K, 210K, and 178K hard negative captions. To test productivity, CREPE contains 17K image-text pairs with nine different complexities plus 246K hard negative captions with atomic, swapping, and negation foils. The datasets are generated by repurposing the Visual Genome scene graphs and region descriptions and applying handcrafted templates and GPT-3. For systematicity, we find that model performance decreases consistently when novel compositions dominate the retrieval set, with Recall@1 dropping by up to 8%. For productivity, models' retrieval success decays as complexity increases, frequently nearing random chance at high complexity. These results hold regardless of model and training dataset size.
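For reference, image-to-text retrieval performance of the kind CREPE reports can be computed as in the following Python sketch; the embeddings, candidate-set construction, and dimensions are synthetic stand-ins rather than CREPE's exact protocol.

```python
import numpy as np

def recall_at_1(image_embs, caption_embs, ground_truth):
    """Image-to-text retrieval Recall@1 over a candidate set that mixes true
    captions with hard negatives; rows are L2-normalized embeddings. A generic
    evaluation sketch, not CREPE's exact protocol."""
    sims = image_embs @ caption_embs.T  # cosine similarity matrix
    top1 = sims.argmax(axis=1)          # best-matching caption per image
    return float(np.mean(top1 == ground_truth))

rng = np.random.default_rng(0)
imgs = rng.normal(size=(100, 64)); imgs /= np.linalg.norm(imgs, axis=1, keepdims=True)
caps = rng.normal(size=(500, 64)); caps /= np.linalg.norm(caps, axis=1, keepdims=True)
gt = rng.integers(0, 500, size=100)  # index of the true caption per image
print(recall_at_1(imgs, caps, gt))
```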
The lack of standardization is a prominent issue in magnetic resonance (MR) imaging. This often causes undesired contrast variations due to differences in hardware and acquisition parameters. In recent years, MR harmonization using image synthesis with disentanglement has been proposed to compensate for the undesired contrast variations. Despite the success of existing methods, we argue that three major improvements can be made. First, most existing methods are built upon the assumption that multi-contrast MR images of the same subject share the same anatomy. This assumption is questionable since different MR contrasts are specialized to highlight different anatomical features. Second, these methods often require a fixed set of MR contrasts for training (e.g., both T1-weighted and T2-weighted images must be available), which limits their applicability. Third, existing methods generally are sensitive to imaging artifacts. In this paper, we present a novel approach, Harmonization with Attention-based Contrast, Anatomy, and Artifact Awareness (HACA3), to address these three issues. We first propose an anatomy fusion module that enables HACA3 to respect the anatomical differences between MR contrasts. HACA3 is also robust to imaging artifacts and can be trained and applied to any set of MR contrasts. Experiments show that HACA3 achieves state-of-the-art performance under multiple image quality metrics. We also demonstrate the applicability of HACA3 on downstream tasks with diverse MR datasets acquired from 21 sites with different field strengths, scanner platforms, and acquisition protocols.
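As a rough sketch of attention-based fusion across contrasts, the following Python snippet softmax-weights per-contrast anatomy features by their affinity to a query vector. This is a generic attention example under assumed shapes, not HACA3's actual anatomy fusion module.

```python
import numpy as np

def attention_fuse(anatomy_feats, query):
    """Fuse per-contrast anatomy features with softmax attention weights
    derived from a query vector; a generic attention sketch, not HACA3's
    actual anatomy fusion module. anatomy_feats: (n_contrasts, d)."""
    scores = anatomy_feats @ query / np.sqrt(query.size)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ anatomy_feats, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 32))  # assumed features from 4 MR contrasts
q = rng.normal(size=32)           # assumed query from the target contrast
fused, w = attention_fuse(feats, q)
print(w.round(3), fused.shape)
```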
Machine learning models are known to be susceptible to adversarial perturbation. One famous attack is the adversarial patch, a sticker with a particularly crafted pattern that makes the model incorrectly predict the object it is placed on. This attack presents a critical threat to cyber-physical systems that rely on cameras such as autonomous cars. Despite the significance of the problem, conducting research in this setting has been difficult; evaluating attacks and defenses in the real world is exceptionally costly while synthetic data are unrealistic. In this work, we propose the REAP (REalistic Adversarial Patch) benchmark, a digital benchmark that allows the user to evaluate patch attacks on real images, and under real-world conditions. Built on top of the Mapillary Vistas dataset, our benchmark contains over 14,000 traffic signs. Each sign is augmented with a pair of geometric and lighting transformations, which can be used to apply a digitally generated patch realistically onto the sign. Using our benchmark, we perform the first large-scale assessments of adversarial patch attacks under realistic conditions. Our experiments suggest that adversarial patch attacks may present a smaller threat than previously believed and that the success rate of an attack on simpler digital simulations is not predictive of its actual effectiveness in practice. We release our benchmark publicly at https://github.com/wagner-group/reap-benchmark.
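The following Python sketch (using OpenCV) shows how a digitally generated patch might be warped onto a sign with a perspective transform and then relit with an affine adjustment. The corner layout, lighting model, and parameters are simplified assumptions, not REAP's exact per-sign transforms.

```python
import numpy as np
import cv2

def apply_patch(sign_image, patch, corners, alpha=0.8, beta=0.1):
    """Warp a square patch onto a traffic sign with a perspective transform,
    then apply an affine lighting adjustment (alpha * x + beta). A simplified
    stand-in for REAP's per-sign geometric and lighting transforms."""
    h, w = patch.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = np.float32(corners)  # desired patch corners on the sign
    H = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(patch, H, sign_image.shape[1::-1])
    relit = np.clip(alpha * warped.astype(np.float32) + beta * 255, 0, 255)
    mask = cv2.warpPerspective(np.ones((h, w), np.uint8), H,
                               sign_image.shape[1::-1]).astype(bool)
    out = sign_image.copy()
    out[mask] = relit.astype(np.uint8)[mask]
    return out

sign = np.full((256, 256, 3), 200, np.uint8)       # assumed sign crop
patch = np.zeros((64, 64, 3), np.uint8)            # assumed adversarial patch
print(apply_patch(sign, patch, [[90, 90], [170, 95], [165, 175], [85, 170]]).shape)
```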
Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse "population" of environments (i.e. domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot is acting to assist a user (e.g. helping someone with motor impairments with bathing or with scratching an itch). Such tasks are particularly interesting relative to prior sim2real successes because the environment now contains a human who is also acting. This complicates the problem because the diversity of human users (instead of merely physical environment parameters) is more difficult to capture in a population, thus increasing the likelihood of encountering out-of-distribution (OOD) human policies at test time. We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies based on the simulated population only. We study how to best learn such a representation by evaluating on purposefully constructed OOD test policies. We find that sim2real methods that encode environment (or population) parameters and work well in tasks that robots do in isolation, do not work well in assistance. In assistance, it seems crucial to train the representation based on the history of interaction directly, because that is what the robot will have access to at test time. Further, training these representations to then predict human actions not only gives them better structure, but also enables them to be fine-tuned at test-time, when the robot observes the partner act. https://adaptive-caregiver.github.io.
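A minimal sketch of the test-time adaptation idea: encode the interaction history into a latent vector, then fine-tune that latent by gradient descent as the robot observes the partner act. The linear encoder/decoder, dimensions, and data below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a linear encoder maps the interaction history to a latent,
# and a linear decoder predicts the human's next action from that latent.
d_hist, d_latent, d_act = 16, 4, 2
W_enc = rng.normal(scale=0.1, size=(d_latent, d_hist))  # history -> latent
W_dec = rng.normal(scale=0.1, size=(d_act, d_latent))   # latent -> action

history = rng.normal(size=d_hist)  # flattened (state, action) history
z = W_enc @ history                # initial latent from the encoder

# Test-time adaptation: nudge z so the decoder matches observed actions.
for observed_action in rng.normal(size=(10, d_act)):
    pred = W_dec @ z
    grad = W_dec.T @ (pred - observed_action)  # d/dz of 0.5 * ||pred - a||^2
    z -= 0.5 * grad

print(z)
```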
This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites.
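As a generic illustration of the constrained action selection such systems need, the following Python sketch picks the highest-value action among those passing operational constraints. It is a toy example under assumed interfaces, not the deployed DeepMind/Trane controller.

```python
import numpy as np

def safe_greedy_action(q_values, satisfies_constraints):
    """Pick the highest-value action among those passing operational
    constraints (e.g., temperature and pressure bounds); fall back to a
    designated safe action if none pass. A generic sketch of constrained
    action selection, not the deployed controller."""
    feasible = [a for a in range(len(q_values)) if satisfies_constraints(a)]
    if not feasible:
        return 0  # assumed index of a known-safe fallback action
    return max(feasible, key=lambda a: q_values[a])

q = np.array([1.2, 3.4, 2.8])
print(safe_greedy_action(q, lambda a: a != 1))  # action 1 violates a constraint
```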